(54) Title: A SIGNAL PROCESSING METHOD TO ANALYSE TRANSIENTS OF SPEECH SIGNALS

(57) Abstract

The present invention is related to a method and an apparatus for
determination of a parameter of a system generating a signal
containing information about the parameter. The method comprises the
step of short time Laplace transforming the signal and may be
utilised for classifying the system in question in accordance with
one or more determined parameters into one class of a set of
predefined classes defined by predetermined ranges of values of the
parameters. The invention also relates to the use of a shape of
energy changes of a signal for identifying or representing features
of the system generating the signal. This use may be applied to
recognition of sound features perceivable by e.g. a human ear as
representing a distinct sound picture. It has for example been found
that the signal information relevant to recognition of speech is
present in a transient part of the speech signal.

A SIGNAL PROCESSING METHOD TO ANALYSE TRANSIENTS OF SPEECH SIGNALS

The present invention relates to a method for determination of a
parameter of a system generating a signal containing information
about the parameter.

The method may be used for identification of sound or speech
signals, such as in speech recognition; or for quality measurement
of audio products or systems, such as loudspeakers, hearing aids and
telecommunication systems; or for quality measurement of acoustic
conditions. The method of the present invention may also be used in
connection with speech compression and decompression in narrow band
telecommunication.

The method may also be used in analysis of mechanical vibrations
generated by a manufactured device during operation, e.g. for
detection of malfunction of the device.

The method may further be used in electrobiology, for example for
analysis of neuroelectrical signals such as signals from an
electroencephalograph, an electromyograph, etc.

BACKGROUND OF THE INVENTION 

Prior art methods of signal processing are based on a short time
Fourier transform of signals and it is assumed that the signals are
steady state signals.

In steady state analysis the signal is assumed stationary in the
period the signal is analysed and the steady state spectrum is
calculated.

In real life steady state signals do not occur, and steady state
analysis does not provide sufficient knowledge of phenomena within
various scientific and technological fields. Consider for example
speech analysis. The human ear has the ability to simultaneously
catch fast sound signals, detect sound frequencies with great
accuracy and differentiate between sound signals in complicated
sound environments. For instance it is possible to understand what
a singer is singing in an accompaniment of musical instruments.

It is assumed that the cochlea in the human ear can be regarded as
comprising a large number of band-pass filters within the frequency
range of the human ear.



The time response $f(t)$ for one band-pass filter due to an
excitation can be separated into two components, the transient
response, $f_t(t)$, and the steady state response, $f_s(t)$:

$$f(t) = f_t(t) + f_s(t)$$

Traditional signal processing is based on the steady state response
$f_s(t)$, and the transient response $f_t(t)$ is assumed to vanish very
fast and to be without importance for the perception, see for
example "Principles of Circuit Synthesis", McGraw-Hill 1959, Ernest
S. Kuh and Donald O. Pederson, page 12, lines 9-15, where it is
stated that:

"only the forced response is considered while the response due to
the initial state of the network is ignored".

Thus, when students are introduced to the world of signal analysis,
they learn that the transient response, i.e. the response due to
the initial state of the network, should be ignored because it
vanishes within a very short period of time. Furthermore, it is
rather difficult to analyse these transient signals by use of
traditional linear methods of analysis.

The ability of the human ear to hear very short sounds and at the
same time detect frequencies with great accuracy is in conflict
with the traditional filter-based spectrum analysis. The time window
[...] segmentation. The transient analysis of [...] much different
to the transient analysis of the present invention, which is based
on a direct transient detection in the time domain.

SUMMARY OF THE INVENTION

The present invention provides an approach which is different in
principle from all known methods for processing signals. The
approach taken and some of the results obtained will be explained
by way of an example in the context of analysis of speech signals.

Speech is produced by means of short pulses generated by the vocal
cords in the case of voiced speech and by friction in the vocal
tract in the case of unvoiced speech. The pulses are filtered by
the vocal tract, which acts as a time-varying filter. The output
response will consist of quasi steady state terms and also
transient terms. The quasi steady state terms will only be damped
slightly in the period before the next pulse is generated. The
transient terms will be sufficiently damped in the time period
before the next pulse is generated.

The speech signal is often assumed to have only quasi steady state
terms in the period or time window of the analysis, typically 20-30
ms.

The placement of formants, the formants being energy bands in the
short time power spectrum calculated by means of a short time
spectrum analysis, has previously been assumed decisive for speech
intelligibility, together with voiced/unvoiced detection, the pitch
and the quasi steady state power.

However, a number of observations which have been made within
the field of auditory perception research do not conform to the
previous assumptions:

Why is it possible to understand and identify a deep male voice
through communication channels that have a higher cut-off frequency
than the male pitch?

The only difference between the pronunciation of the letters e, b,
d is in the first 1-3 ms of the voice signal, and this information
will be lost if the analysis has a time window of 20-30 ms.

How can the absolute placement of these formants be decisive when
their placement is quite different for different people,
particularly between small children and large males?

Why is distortion dominated by odd order harmonics and caused by
cross-over distortion in a class B amplifier much more disturbing
than distortion dominated by even order harmonics caused by
amplitude distortion in a class A amplifier?

The short time power spectrum will not distinguish frequencies from
different sources, and tones generated by other sources than the
speech signal will act like false formants.

Why does a signal consisting of three tones with the same
frequencies as the formants for a vowel not give the slightest
perception of the vowel at all? The signal just sounds like three
separate tones.

Why is the ear very sensitive to frequency changes of a signal up
to about 1000 Hz, where changes of +/- 3 Hz can be detected? For
frequencies above 1000 Hz, the sensitivity is much smaller.

The research performed by the present applicant suggests
that the ear is tone dominant until about 1.4 - 1.6 kHz and
transient dominant above. Tone dominant means that the pulses
launched from the hair cells as a response to a tone signal are
synchronised to the tone signal. Transient dominant means, in the
present context, that the hair cells are activated by changes of
the energy with rise and fall times of at most 2 ms, typically
caused by transient pulses.

Regarding speech signals, it is assumed that the quasi steady state
terms are in the tone dominant interval of the ear and that the
transient terms are in the transient dominant interval. It is
believed that the transient terms are very important for speech
intelligibility. The transient terms are seen as transient pulses
in the speech signal. The rise time and the shape of leading and
lagging edges of the envelope of transient pulses, in terms of a
profile of damped frequencies, describe the sound picture. The
shape of the leading and lagging edges, the dynamic changes (change
of amplitude) of the transient pulses, voiced/unvoiced detection
and the changes of pitch are decisive for speech recognition.

This approach provides a number of advantages with respect to
explaining the earlier mentioned speech perception observations.

A natural explanation is provided as to why it is possible to
understand and identify a deep male voice through communication
channels that have a higher cut-off frequency than the male pitch:
the pitch can be detected as the period between transient pulses.
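
As a rough illustration of detecting pitch as the period between transient pulses, the sketch below estimates a pitch from a list of pulse onset times. The function name `pitch_from_pulse_times` and the use of the median interval are assumptions made for this example; the text does not prescribe how the period is measured.

```python
from statistics import median

def pitch_from_pulse_times(pulse_times_s):
    """Estimate pitch in Hz as the reciprocal of the median pulse interval.

    pulse_times_s: ascending onset times (seconds) of transient pulses.
    The median is an illustrative choice that tolerates an odd interval.
    """
    if len(pulse_times_s) < 2:
        raise ValueError("need at least two pulses to measure a period")
    intervals = [b - a for a, b in zip(pulse_times_s, pulse_times_s[1:])]
    return 1.0 / median(intervals)

# Hypothetical onsets: pulses every 10 ms correspond to roughly 100 Hz.
example_pitch = pitch_from_pulse_times([0.000, 0.010, 0.020, 0.030])
```

Because only the spacing of the pulses is used, the estimate does not require the fundamental itself to pass through the channel.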

The absolute placement of formants is not decisive. The damped
frequencies profile of the shape of the transient pulse envelope is
dominated by damped difference frequencies of the transient terms.

Distortion caused by cross-over distortion in a class B amplifier
generates abrupt energy changes (unwanted transients) which are
much more disturbing than distortion caused by amplitude distortion
in a class A amplifier, which does not generate the same abrupt
energy changes.

Robust data- or telecommunication is based on modulation. The
envelope of transient pulses is a kind of amplitude modulation,
transient or impulse response modulation, and will have the same
advantages.

It is unlikely that frequencies from other sources will cause
interference patterns with the speech signal that give energy
changes with time constants and shapes in the range that is
decisive for speech intelligibility. This means that transient
modulation will be robust in noisy environments and communication
channels.



The ear is probably very sensitive to changes of a frequency up
to about 1000 Hz because the nerve pulses are synchronised to the
frequency and the period between the pulses is a measure for the
frequency. In the high frequency range, where the pulses are not
synchronised to the frequency, only placement of the frequency in
the cochlea is a measure for the frequency.

According to the invention it has for example been found that the
signal information relevant to recognition of speech is present in
a transient part of the speech signal. Thus, the method of the
present invention may involve a separation of the transient part of
an auditory signal, a generation of a transient pulse corresponding
to the transient part, and analysis of the shape of the pulse. In
an auditory signal, the corresponding transient pulse may be
repeated with time intervals, and the time interval of these
periodic transient pulses is normally also analysed or determined.

In real life, the human ear reacts to energy changes at high
frequencies in order to recognise phonemes or sound pictures. In
the present method, transient pulses corresponding to the energy
changes observed by the ear are extracted at these high
frequencies, wherefore the transient pulses preferably are
transformed to the low frequency range while still maintaining the
distinct features of the sound pictures or phonemes. Thus, by using
the principles of the invention, it is possible to obtain distinct
features within auditory signals by examining the transformed low
frequency signals.

The invention relates to the use of the shape of energy changes of
a signal for identifying or representing features of the system
generating the signal, for example in recognition of sound features
which can be perceived by an animal ear such as a human ear as
representing a distinct sound picture.

The method of the present invention provides an expression for the
transient conditions of the auditory signal. The method comprises a
band-pass filtration of an auditory signal within the frequency
range of the human ear and a detection of a low-pass filtered
envelope, which envelope can then be analysed with known methods of
signal analysis. The envelope is an expression of the transient
part of the signal.

The method of signal analysis which should be used when analysing
the envelope, and the characteristics of the band-pass filter
which should be selected, will depend on the purpose of the
analysis. The purpose may be speech recognition, quality
measurement of audio products or acoustic conditions, or narrow
band telecommunication.

The invention also relates to a system for processing a signal to
reduce the bandwidth of the signal with substantial retention of
the information of the signal. The system may further comprise
means for extracting the transient component of the auditory
signal, and it may comprise means for detecting an envelope of the
transient component.

A signal may be separated into a sum of impulse responses generated
by poles and zeroes in the system that has generated the signal, if
the time between the excitation pulses is sufficiently long compared
to the duration of the impulse responses for the system.

In WO 94/25958 it is shown that the envelope of the transient
component in a speech signal is very important for its recognition,
and it is shown that the envelope of the impulse response will
contain exponential functions and difference frequencies defined by
the impulse response.

A method based on damped sine functions to extract important
features from the envelope signal is described, and examples where
the method is used on speech signals show that the features are
important in speech analysis.



Before entering into a more detailed explanation of features of the
method of the invention, a few definitions will be given:

In short time analysis the transient component in a signal is a
matter of definition. For auditory signals, the idea is to obtain
an expression that gives a response corresponding to the response
in the cochlea to an abrupt change in the signal energy. An abrupt
change in the signal energy corresponds to the transient component
in the auditory signal. Thus, in the present context, the term
"transient component" designates any signal corresponding to an
abrupt energy change in an auditory signal. The transient component
holds the signal information to be analysed, and in order to analyse
this information the transient component may be transformed to a
corresponding transient pulse having a distinct shape. Thus, in the
present context, the term "transient pulse" refers to a pulse
having a distinct shape and substantially holding the information
of the transient component of the auditory signal and thus
corresponding to an abrupt change in the energy of the auditory
signal. As mentioned above, the transient part of a sound signal may
be repeated with time intervals and thus, in the present context,
the term "periodic" when used in combination with a transient
component, response or pulse designates any transient component,
response or pulse being repeated with intervals.

The term "shape" designates any arbitrary time-varying function
(which is time-limited or not time-limited) which, within a
given time interval $T_p$, has a distinctly different amplitude level
in comparison with the amplitude level outside the interval. Thus,
$T_p$ is the duration of the shape function when the shape function is
time-limited, or the duration of the part of the function which has
a distinctly different amplitude level in comparison with the
amplitude level outside the time interval.

In order to extract information from the shape of the energy
changes, one broad aspect of the invention relates to representing
the shape of the energy changes by the short time Laplace transform
of a transient pulse of the signal. Several methods can be
applied in order to obtain a transient pulse corresponding to the
change in energy, but it is preferred that an envelope detection is
used, where the envelope preferably should be detected from a
transient response of the energy change in the auditory signal.

The energy change representing the distinct sound picture can be a
phoneme or vowel or any other sound which gives a sudden energy
change in an auditory signal.

It is also an aspect of the invention to provide a method for
identifying, in an auditory signal, energy changes which can be
perceived by an animal ear such as a human ear as representing a
distinct sound picture, the method comprising comparing the shape
of energy changes of the signal with predetermined energy change
shapes representing distinct sound pictures. For the identification
it is preferred that the shape of the energy changes is
represented by the shape of a transient pulse of the signal, and it
is furthermore preferred that the shape of the transient pulse
is obtained by an envelope detection of a transient response
of the energy change in the auditory signal.

The invention also relates to a method for processing a signal so
as to reduce the bandwidth of the signal with substantial retention
of the information of the signal, comprising extracting a transient
part of the signal. The method may further comprise detecting an
envelope of the transient part of the signal.

Known methods of processing signals are based on a short time
Fourier transform of signals, and it is assumed that the signals
are steady state signals.

In steady state analysis the signal is assumed stable in the period
the signal is analysed, and the steady state spectrum is
calculated.



In WO 94/25958 it is disclosed that transient pulses are important
for speech coding and decoding in narrow band communication, for
speech recognition and synthesis, and for sound quality in auditory
products (i.e. loudspeakers, amplifiers and hearing aids).

An important part of a transient signal is the exponential
functions or damping ratios or time constants. The damping ratio is
the reason that the impulse response has a finite duration. The
fact that the transient signal is important for auditory perception
indicates that the response from the hair cells is dependent on the
time constants. If this is the case, it is possible that the
damping ratios in the response from nerve cells in general are
important for the human nervous system.

Transient signals are also important in many other applications,
among others signals generated by impacts from defects in rolling
bearings and gear-boxes.

Based on the transient signal, it is possible to determine the
natural time constants and frequencies in the system generating the
signal. Further, it is possible to determine the excitation pulses
of the system.



BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows a time-domain representation of a linear
time-invariant system,

Fig. 2 shows the impulse response of a Butterworth low-pass
filter of 3rd order and a cut-off frequency at 700 Hz,

Fig. 3 shows the response with the filter relaxed for
$t < 0$ and with a 4000 Hz tone as input at $t > 0$,

Fig. 4 shows the s-plane with poles and the zero for $H(\sigma,\omega)$,

Fig. 5 shows $H(\sigma,\omega)$ for $\omega_1$ and $\omega_2$ analysed parallel with the
$\sigma$ axis,

Fig. 6 shows transient characteristics in speech signals,

Figs. 7-12 show processed speech signals,

Fig. 13 shows a schematic of a filter bank according to the
present invention.

DETAILED DESCRIPTION OF THE DRAWING

The importance of the transient part of a signal has been an
overlooked phenomenon in signal analysis.

The response of a linear system to either an impulse or a step
function is defined by its transient response properties.

The relationship between the input and the output for the linear
time-invariant system shown in Fig. 1 can be written as the
convolution of the input signal and the impulse response of the
system:

$$v_o(t) = \int_{-\infty}^{t} v_i(\tau)\, h(t - \tau)\, d\tau \qquad (1)$$

If the system is initially relaxed and the input signal $v_i(t)$ is
zero for $t < 0$, then the lower integration limit of Eq. (1) can be
replaced with zero. Eq. (1) then shows the important role played by
the impulse response in terms of the actual signal processing that
is performed by the system. It states that the input signal is
weighted or multiplied by the impulse response at every instant in
time and, at any specific point in time, the output is the
summation or integral of all past weighted inputs.
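
The weighting-and-summation reading of Eq. (1) can be sketched on a uniform sample grid. This is a minimal illustration, assuming an initially relaxed system and a simple Riemann sum; the function name `causal_convolution` and the sample values are hypothetical.

```python
def causal_convolution(v_in, h, dt):
    """Riemann-sum approximation of v_o(t) = int_0^t v_i(tau) h(t - tau) dtau.

    v_in and h are lists sampled with spacing dt; both are taken as zero
    for t < 0 (initially relaxed system).
    """
    v_out = []
    for n in range(len(v_in)):
        acc = 0.0
        for m in range(n + 1):
            if n - m < len(h):
                # Past input sample m weighted by the impulse response.
                acc += v_in[m] * h[n - m]
        v_out.append(acc * dt)
    return v_out

dt = 1e-4
h = [1.0, 0.5, 0.25, 0.125]              # hypothetical impulse response
impulse = [1.0 / dt, 0.0, 0.0, 0.0]      # unit-area pulse at t = 0
response = causal_convolution(impulse, h, dt)
```

Feeding a unit-area pulse returns the impulse response itself, which is the sense in which the impulse response characterises the system.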

The impulse response of a real system has a finite duration and the
transient response has the same duration. Fig. 2 shows the impulse
response of a Butterworth low-pass filter of 3rd order and a cut-off
frequency at 700 Hz. Fig. 3 shows the response with the filter
relaxed for $t < 0$ and with a 4000 Hz tone as input at $t > 0$.

In many processes $v_i(t)$ will be a pulse with a short duration and
$v_i(t) \approx 0$ before the next pulse will be generated.

The Laplace transform of a signal $v(t)$ is defined by

$$L(s) = \int_0^{\infty} v(t)\, e^{-st}\, dt = \int_0^{\infty} v(t)\, e^{-(\sigma + j\omega)t}\, dt \qquad (2)$$

If $v(t)$ is the impulse response $h(t)$ for a system with 2 complex
poles,

$$h(t) = e^{-\sigma_0 t} \cos(\omega_0 t), \quad t > 0 \qquad (3)$$

and 0 for $t < 0$, and $s \neq -(\sigma_0 \pm j\omega_0)$, the Laplace transform is

$$H(s) = \frac{s + \sigma_0}{(s + \sigma_0 + j\omega_0)(s + \sigma_0 - j\omega_0)}$$

or

$$H(\sigma,\omega) = \frac{\sigma + \sigma_0 + j\omega}{(\sigma + \sigma_0 + j(\omega + \omega_0))(\sigma + \sigma_0 + j(\omega - \omega_0))} \qquad (4)$$

From Eq. (4) it is seen that for $(\sigma,\omega) \to (-\sigma_0, \pm\omega_0)$, $H(\sigma,\omega) \to \pm\infty$.

This is a well-known phenomenon, and a logical consequence of this
is as follows:

If the signal analysed is dominated by the impulse response of the
system generating the signal, it is possible to determine the
natural time constants and frequencies for the system.
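
The growth of the transform near a pole, which is what makes the natural time constant and frequency identifiable, can be checked numerically. The sketch below assumes the two-pole form $H(s) = (s + \sigma_0)/((s + \sigma_0 + j\omega_0)(s + \sigma_0 - j\omega_0))$; the function name and the numeric values of `sigma0` and `omega0` are hypothetical.

```python
import math

def two_pole_transfer(sigma, omega, sigma0, omega0):
    """H(sigma, omega) for the impulse response exp(-sigma0 t) cos(omega0 t)."""
    s = complex(sigma, omega)
    return (s + sigma0) / ((s + sigma0 + 1j * omega0) * (s + sigma0 - 1j * omega0))

sigma0 = 100.0                   # hypothetical decay rate, 1/s
omega0 = 2 * math.pi * 1000.0    # hypothetical natural frequency, rad/s

# Approach the pole at (-sigma0, omega0) along a line parallel with the
# sigma axis and record the magnitude at decreasing distances d.
magnitudes = [abs(two_pole_transfer(-sigma0 + d, omega0, sigma0, omega0))
              for d in (100.0, 10.0, 1.0)]
```

The magnitude rises roughly as $1/(2d)$ as the distance $d$ to the pole shrinks, so a scan over $(\sigma,\omega)$ peaks at the system's own time constant and frequency.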

Fig. 5 shows a plot of $H(\sigma,\omega)$ for $\omega = \omega_1$ and $\omega = \omega_2$.

Analysing a signal along or parallel with the $j\omega$ axis will give a
frequency profile for a given $\sigma$.

Analysing a signal along or parallel with the $\sigma$ axis will give a
time constant profile for a given $j\omega$.

If a signal has a time constant profile with significant variations
for specific frequencies, the signal is transient dominated.
Conversely, if the signal does not vary significantly for any
frequency, the signal is steady state dominated.

A short time Laplace transform is defined by:

$$L(\sigma,\omega,t) = \int_0^{t} v_i(t - \lambda)\, e^{-(\sigma + j\omega)\lambda}\, d\lambda \qquad (5)$$

in which $v_i$ is the signal, $L$ is the transformed signal, $\sigma$ is a time
constant, and $\omega$ is an angular frequency.
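
A direct numerical evaluation of the short time Laplace transform can be sketched as a running sum. This assumes the reading $L(\sigma,\omega,t) = \int_0^t v_i(t-\lambda)\, e^{-(\sigma + j\omega)\lambda}\, d\lambda$ with $\sigma$ taken as a positive decay rate; the function name, the test signal and the parameter values are hypothetical.

```python
import cmath

def short_time_laplace(v_in, sigma, omega, dt):
    """Riemann-sum sketch of L(sigma, omega, t) for t = n * dt.

    v_in is the sampled signal, taken as zero for t < 0. One complex
    value is returned per input sample.
    """
    s = complex(sigma, omega)
    kernel = [cmath.exp(-s * n * dt) for n in range(len(v_in))]
    out = []
    for n in range(len(v_in)):
        acc = 0j
        for m in range(n + 1):
            acc += v_in[n - m] * kernel[m]
        out.append(acc * dt)
    return out

# Unit step input: the exact transform is (1 - exp(-s t)) / s.
step_response = short_time_laplace([1.0] * 500, 500.0, 1885.0, 4e-5)
```

The exponential kernel acts as a built-in time window, so the result at time $t$ is dominated by the most recent part of the signal.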

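It is not possible to calculate the short time Laplace transform in
the same way as the DFT in the discrete time domain, because two
arbitrary exponential functions, $e^{-\sigma_1 t}$ and $e^{-\sigma_2 t}$, are not orthogonal
with respect to each other.

The short time Fourier analysis in the analogue time domain is
based on a filter bank method. In this paper an equivalent method
will be developed for the Laplace transform.

From Eq. (1) and Eq. (3):

$$v_o(t) = \int_0^{t} v_i(t - \lambda)\, e^{-(\sigma + j\omega)\lambda}\, d\lambda + \int_0^{t} v_i(t - \lambda)\, e^{-(\sigma - j\omega)\lambda}\, d\lambda \qquad (6)$$

$$v_o(t) = V(\sigma,\omega,t) + V^*(\sigma,\omega,t) = u(t) + u^*(t)$$

where $u^*(t)$ is the complex conjugate of $u(t)$, and we have

$$\mathrm{Re}[L(\sigma,\omega,t)] = \tfrac{1}{2}\, v_o(t) \qquad (7)$$

From Eq. (6) and Eq. (7) it is seen that filtering the signal $v_i(t)$ by
a filter with the impulse response $h(\sigma,\omega,t)$ with 2 complex poles
will represent the real part of the short time Laplace transform
$L(\sigma,\omega,t)$.

If we let $v_i(t)$ be equal to the impulse response of a single pole, we
have

$$u(t) = \int_0^{t} k\, e^{-(\sigma_0 + j\omega_0)(t - \lambda)}\, e^{-(\sigma + j\omega)\lambda}\, d\lambda = k\, \frac{e^{-(\sigma_0 + j\omega_0)t} - e^{-(\sigma + j\omega)t}}{(\sigma - \sigma_0) + j(\omega - \omega_0)} \qquad (8)$$

and from Eq. (7) we have

$$v_o(t) = -\frac{2k(\sigma - \sigma_0)\left(e^{-\sigma t}\cos(\omega t) - e^{-\sigma_0 t}\cos(\omega_0 t)\right)}{(\sigma - \sigma_0)^2 + (\omega - \omega_0)^2} + \frac{2k(\omega - \omega_0)\left(e^{-\sigma t}\sin(\omega t) - e^{-\sigma_0 t}\sin(\omega_0 t)\right)}{(\sigma - \sigma_0)^2 + (\omega - \omega_0)^2} \qquad (9a)$$

or

$$\frac{v_o(t)}{2k} = \frac{e^{-\sigma_0 t}\left((\sigma - \sigma_0)\cos(\omega_0 t) - (\omega - \omega_0)\sin(\omega_0 t)\right)}{(\sigma - \sigma_0)^2 + (\omega - \omega_0)^2} - \frac{e^{-\sigma t}\left((\sigma - \sigma_0)\cos(\omega t) - (\omega - \omega_0)\sin(\omega t)\right)}{(\sigma - \sigma_0)^2 + (\omega - \omega_0)^2} \qquad (9b)$$

Eq. (9) is not defined for $(\sigma,\omega) = (\sigma_0,\omega_0)$, but from Eq. (8) we have in
this case

$$u(t) = k\, t\, e^{-(\sigma_0 + j\omega_0)t}$$

and

$$v_o(t) = 2kt\, e^{-\sigma_0 t} \cos(\omega_0 t) \qquad (10)$$

and we have $v_o(t) \to 0$ for $t \to \infty$.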
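
The closed-form response to a single-pole signal can be cross-checked against a direct numerical integration of the convolution. The sketch assumes the signal $k\,e^{-(\sigma_0 + j\omega_0)t}$, the filter kernel $e^{-(\sigma + j\omega)\lambda}$, and $v_o(t) = 2\,\mathrm{Re}[u(t)]$; the function names and parameter values are hypothetical, and the trapezoidal rule is just a convenient integrator.

```python
import cmath
import math

def v_out_closed_form(t, k, sigma, omega, sigma0, omega0):
    """Closed-form response in the form of Eq. (9b), multiplied through by 2k."""
    a = sigma - sigma0
    b = omega - omega0
    denom = a * a + b * b
    term0 = math.exp(-sigma0 * t) * (a * math.cos(omega0 * t) - b * math.sin(omega0 * t))
    term1 = math.exp(-sigma * t) * (a * math.cos(omega * t) - b * math.sin(omega * t))
    return 2.0 * k * (term0 - term1) / denom

def v_out_numeric(t, k, sigma, omega, sigma0, omega0, steps=2000):
    """2 * Re[u(t)], with u(t) integrated by the trapezoidal rule."""
    s = complex(sigma, omega)
    s0 = complex(sigma0, omega0)
    d_lam = t / steps
    total = 0j
    for n in range(steps + 1):
        lam = n * d_lam
        weight = 0.5 if n in (0, steps) else 1.0
        total += weight * k * cmath.exp(-s0 * (t - lam)) * cmath.exp(-s * lam)
    return 2.0 * (total * d_lam).real

# Hypothetical parameters: filter at 1500 Hz, signal pole at 1000 Hz.
params = dict(k=1.0, sigma=1500.0, omega=2 * math.pi * 1500.0,
              sigma0=1000.0, omega0=2 * math.pi * 1000.0)
closed = v_out_closed_form(1e-3, **params)
numeric = v_out_numeric(1e-3, **params)
```

Agreement between the two values supports the sign pattern of the closed form; it does not replace the analytic derivation.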



Eq. (9) shows that the gain is inversely related to $\sigma - \sigma_0$ and
$\omega - \omega_0$, and when $(\sigma_0,\omega_0)$ is far from $(\sigma,\omega)$ and $e^{-\sigma t} - e^{-\sigma_0 t}$ is small,
$v_o(t) \approx 0$. For $(\sigma_0,\omega_0) \to (\sigma,\omega)$, $v_o(t)$ will have Eq. (10) as the limit. It
is not immediately apparent whether Eq. (9) has the maximum energy for
$(\sigma_0,\omega_0) \to (\sigma,\omega)$.

In the DC domain Eq. (9) can be written as

$$v_o(t) = 2k\, \frac{e^{-\sigma_0 t} - e^{-\sigma t}}{\sigma - \sigma_0} \qquad (11)$$

The maximum for $v_o(t)$ can be found as follows:

$$\frac{dv_o}{dt} = 0$$

when

$$t_m = \frac{\log(\sigma) - \log(\sigma_0)}{\sigma - \sigma_0} \qquad (12)$$

and Eq. (11) will have the maximum for this value.
It can be shown that $t_m \to 1/\sigma_0$ when $\sigma \to \sigma_0$.

When $\sigma \ll \sigma_0$ we will have the approximated maximum with

$$[\ldots] \qquad (13)$$

From Eq. (13) it can be shown that [...] for $\sigma \to \sigma_0$.

In Eq. (11), $e^{-\sigma_0 t}$ represents the signal to be analysed and $e^{-\sigma t}$ the
filter. Table 1 shows the result with a filter having
$\sigma = 100\ \mathrm{s}^{-1}$ and the signal varying from 1 to 10000 $\mathrm{s}^{-1}$.

It is not surprising that the convolution acts as a low-pass
filter. The important fact is that the exponential function in the
DC domain in some way acts as frequencies do in the frequency
domain.

In Table 1, $v'_o(t_m)$ is the result of a convolution where the signal
is differentiated. The result is, as expected, a high-pass filter.

If we look at Eq. (9a) without exponential functions it can be
written as

$$v_o(t) = \frac{2k\left(\sin(\omega t) - \sin(\omega_0 t)\right)}{\omega - \omega_0} \qquad (14)$$

and it is seen that for $\omega \to \infty$ we will have $v_o(t) \to 0$.



Filter: $\sigma = 100\ \mathrm{s}^{-1}$

$\sigma_0$ [s$^{-1}$]    $t_m$ [s]        $v_o(t_m)$       $v'_o(t_m)$
1              0.046516871    0.954548457    0.009545485
10             0.025584279    0.774263683    0.077426368
100            0.010000000    0.367879441    0.367879441
1000           0.002558428    0.077426368    0.774263683
10000          0.000465169    0.009545485    0.954548457

Table 1. $v_o(t_m)$ is given by Eq. (11, 12) and normalised by $\sigma$ and
$2k$. $v'_o(t_m)$ is a convolution where the signal is differentiated and
normalised by $2k$.
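
The entries of Table 1 can be recomputed from the DC-domain response and its maximum. The sketch assumes $v_o(t) = 2k\,(e^{-\sigma_0 t} - e^{-\sigma t})/(\sigma - \sigma_0)$ and $t_m = (\log\sigma - \log\sigma_0)/(\sigma - \sigma_0)$, with "normalised by $\sigma$" read as multiplication by $\sigma$ and the differentiated-signal column taken as $\sigma_0$ times the unnormalised peak. The function name is hypothetical.

```python
import math

def dc_convolution_peak(sigma, sigma0):
    """Return (t_m, v_o, v_o_diff) for a filter exp(-sigma t) and signal exp(-sigma0 t).

    v_o is the peak divided by 2k and multiplied by sigma; v_o_diff is the
    peak for the differentiated signal divided by 2k (magnitude only).
    """
    if math.isclose(sigma, sigma0):
        # Limit case: the response is t * exp(-sigma t), peaking at 1/sigma.
        t_m = 1.0 / sigma
        peak = t_m * math.exp(-sigma * t_m)
    else:
        t_m = (math.log(sigma) - math.log(sigma0)) / (sigma - sigma0)
        peak = (math.exp(-sigma0 * t_m) - math.exp(-sigma * t_m)) / (sigma - sigma0)
    return t_m, sigma * peak, sigma0 * peak

table_rows = [(s0,) + dc_convolution_peak(100.0, s0)
              for s0 in (1.0, 10.0, 100.0, 1000.0, 10000.0)]
```

The two normalised columns mirror each other around $\sigma_0 = \sigma$, which is the low-pass and high-pass behaviour the text describes.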

For $\omega \ll \omega_0$ we will have

$$v_o(t) \approx -\frac{2k\left(\sin(\omega t) - \sin(\omega_0 t)\right)}{\omega_0} \qquad (15)$$

It can be shown that for $\omega \to \omega_0$ we will have

$$v_o(t) \to 2kt \cos(\omega_0 t) \qquad (16)$$

This result is, as expected, unstable.

In transient analysis only the beginning of the signal is of
interest, and if $\omega_0 \gg 1$ Eq. (14) will act as a band-pass filter.

Speech processing is based on fast energy pulses generated by the
vocal cords or by friction in the articulation channel, weighted by
the impulse response of the articulation channel. The rise time for
the excitation pulses has to be sufficiently faster than the rise
time of the energy of the impulse response.



The shape of the energy pulses is an important feature in speech. If
the time between the pulses is periodic it is voiced speech, and if
not it is unvoiced speech. For some phonemes abrupt changes in the
energy pulses are important.

From WO 94/25958 it is known that the shape of the energy pulses
is important for speech recognition, especially the leading edge.
In the following a method to extract features will be developed
based on an envelope detection.



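The convolution expressed in Eq. (9) can be regarded as a response
from 2 poles in the articulation channel excited by an impulse. If
$\sigma_0 \approx \sigma$ we have from Eq. (9a)

$$v_o(t) \approx \frac{2k\, e^{-\sigma t}\left(\sin(\omega t) - \sin(\omega_0 t)\right)}{\omega - \omega_0} \qquad (17)$$

The envelope is defined as

$$e(t) = \sqrt{u^2(t) + \hat{u}^2(t)}$$

where

$$\hat{u}(t) = u(t) * \frac{1}{\pi t}$$

is the Hilbert Transform.

The envelope of Eq. (17) is then

$$e(t) = \frac{2k\, e^{-\sigma t}}{|\omega - \omega_0|} \sqrt{\left(\sin(\omega t) - \sin(\omega_0 t)\right)^2 + \left(-\cos(\omega t) + \cos(\omega_0 t)\right)^2}$$

$$= \frac{2k\, e^{-\sigma t}}{|\omega - \omega_0|} \sqrt{2\left(1 - \cos((\omega - \omega_0)t)\right)}$$

$$\approx \frac{2k\, e^{-\sigma t}}{|\omega - \omega_0|} \left(1 - \cos((\omega - \omega_0)t)\right) \qquad (18)$$

The approximation is legal because $|\cos((\omega - \omega_0)t)| \le 1$.

As expected, the envelope has a component with the difference
frequency of the 2 frequencies.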
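
The appearance of the difference frequency in the envelope can be verified numerically for the undamped two-tone term. The sketch uses the fact that the Hilbert transform of $\sin(x)$ is $-\cos(x)$; the function names and the example frequencies are hypothetical.

```python
import math

def two_tone_envelope(t, omega, omega0):
    """Envelope of sin(omega t) - sin(omega0 t) from its quadrature pair.

    The quadrature signal is -cos(omega t) + cos(omega0 t), since the
    Hilbert transform of sin(x) is -cos(x).
    """
    in_phase = math.sin(omega * t) - math.sin(omega0 * t)
    quadrature = -math.cos(omega * t) + math.cos(omega0 * t)
    return math.hypot(in_phase, quadrature)

def difference_frequency_form(t, omega, omega0):
    """Equivalent closed form sqrt(2 (1 - cos((omega - omega0) t)))."""
    return math.sqrt(max(0.0, 2.0 * (1.0 - math.cos((omega - omega0) * t))))

omega = 2 * math.pi * 2800.0    # hypothetical tone, rad/s
omega0 = 2 * math.pi * 2300.0   # hypothetical tone, rad/s
sample_times = [1e-4 * n for n in range(1, 10)]
envelope_pairs = [(two_tone_envelope(t, omega, omega0),
                   difference_frequency_form(t, omega, omega0))
                  for t in sample_times]
```

Both expressions depend only on $\omega - \omega_0$, so the envelope carries the 500 Hz difference tone rather than either of the original frequencies.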

The conclusion is that we can expect to find damped difference
frequencies in the envelope of the transient component.

To detect the damped difference frequencies a filter bank is used.
The features might be detected as a convolution between the
transient pulse and the impulse response of the filters.

In general form the impulse response can be written as

$$h(t) = k\, e^{-\lambda t} \sin(f(\lambda)\, t + \phi)$$

where $\sigma = \lambda$ and $\omega = f(\lambda)$.

In the following analysis $f(\lambda) = 1.5\lambda$, $k = \omega = 1.5\lambda$, and $\phi = 0$ are
selected and we have

$$h(t) = 1.5\lambda\, e^{-\lambda t} \sin(1.5\lambda t) \qquad (19)$$

By selecting $\omega = 1.5\sigma$, Eq. (19) will act as a band-pass filter with a
low Q in relation to the frequencies. Other ratios $\omega/\sigma$ than 1.5 may
be selected, and it is presently preferred that the ratio $\omega/\sigma$
ranges from 0.5 to 2.5. The exponential function gives the advantage
that it acts like a natural time window that ensures that the signal
is naturally damped. The values of the parameters are selected by
studying rise times in important transient pulses and by
experiments.

Fig. 6 shows transient characteristics in speech signals. The top
figure shows 50 ms of an "a" in "hard key" pronounced by a female.

The second signal is a band-pass filtration of the speech signal.
The band-pass filter is a Butterworth filter with 6 poles and a
bandwidth from 2150 to 3550 Hz. This frequency band contains
important transient pulses in the sensitive frequency interval of
the ear.

The third signal is an energy detection of the transient
characteristics of the band-pass filtered speech signal. The
detection is an envelope detection performed by means of a
rectification and a low-pass filtration of the signal. The filter
is a Butterworth filter with 3 poles and a cut-off frequency at 700
Hz.
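
Envelope detection by rectification followed by low-pass filtering can be sketched in a few lines. Note the substitution: a one-pole smoother stands in for the 3-pole Butterworth low-pass filter named in the text, purely to keep the example short. The function name, the sampling rate and the test burst are assumptions.

```python
import math

def envelope_detect(samples, sample_rate_hz, cutoff_hz=700.0):
    """Full-wave rectify, then smooth with a one-pole low-pass filter.

    The one-pole smoother is a simplified stand-in for the 3-pole
    Butterworth low-pass filter; only the 700 Hz cut-off is taken
    from the text.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    state = 0.0
    envelope = []
    for x in samples:
        state += alpha * (abs(x) - state)   # rectification + smoothing
        envelope.append(state)
    return envelope

fs = 48000.0
# 10 ms of silence, then 10 ms of a 3 kHz tone (inside the 2150-3550 Hz band).
burst = [0.0] * 480 + [math.sin(2 * math.pi * 3000.0 * n / fs) for n in range(480)]
burst_envelope = envelope_detect(burst, fs)
```

The output stays at zero during silence and settles near the mean rectified level of the tone, so the onset of the burst shows up as a leading edge in the envelope.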

In WO 97/09712 a method for automatically detecting the leading
edges is disclosed. The method uses the maximum slope of the
leading edge as reference; the leading edge is defined to begin at
the point before the maximum slope where the slope falls below a
given threshold (10-20 % of the maximum slope).
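
The slope-threshold rule for locating the start of a leading edge can be sketched as follows. This is an illustrative reading of the rule as stated here, not the procedure of WO 97/09712 itself; the function name, the 15 % default and the sample envelope are assumptions.

```python
def leading_edge_start(envelope, threshold_ratio=0.15):
    """Return the sample index where the leading edge is taken to begin.

    The steepest rise is located first; the search then walks backwards
    until the sample-to-sample slope drops below threshold_ratio times
    the maximum slope (the text gives 10-20 %; 0.15 is an arbitrary pick).
    """
    if len(envelope) < 2:
        raise ValueError("need at least two samples")
    slopes = [b - a for a, b in zip(envelope, envelope[1:])]
    i_max = max(range(len(slopes)), key=lambda i: slopes[i])
    threshold = threshold_ratio * slopes[i_max]
    i = i_max
    while i > 0 and slopes[i - 1] >= threshold:
        i -= 1
    return i

# Hypothetical envelope: slow drift, then a sharp rise starting at index 3.
example_envelope = [0.0, 0.0, 0.01, 0.02, 0.5, 1.0, 1.2, 1.25]
edge_start = leading_edge_start(example_envelope)
```

Referencing the threshold to the maximum slope makes the rule independent of the absolute level of the pulse.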



The transient (envelope) signal in Fig. (6) has a DC component, 
15 which does not contain any information. Therefore it is preferred 
that the signal is differentiated before it is analysed e.g. by the 
filter bank shown in Fig. 13. 

In Fig. 13, the filters {hi(t), haCt) h„(t)) in the filter bank 
20 connected between the input and the envelope detectors are band- 
pass filters having bandwidths corresponding to the bandwidths of 
the band-pass filters of the cochlea and having centre frequencies 
ranging from 1400 Hz to 6500 Hz. 

The output signals o_ij(p) from the filter bank shown in Fig. 13 are
calculated by:

[equation illegible in source; surviving index ranges:
i = 0, 1, ..., N-1; j = 0, 1, ..., M-1; p = 0, 1, ..., P-1;
separate case for p < 0]

where m = 0, 1, ..., M-1 and M is the number of band-pass filters
with a low Q in the filter bank connected between the outputs and
the envelope detectors, p = 0, 1, ..., P-1 is the sample number, t'
is the differentiated transient signal, and λ_m is the filter bank
parameter, which is normalised by the sampling frequency.

In the analysis M is selected to be 10 and 1500 < λ_m < 12000 s⁻¹,
where λ_m is not normalised. By this we have 1885 < ω_m < 18850 s⁻¹
or 300 < f_m < 3000 Hz.
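
The two ranges quoted above are related by ω = 2πf. A short Python
check of the arithmetic (the function names are illustrative only):

```python
import math

def angular_frequency(f_hz):
    """Convert a frequency in Hz to an angular frequency in s^-1."""
    return 2.0 * math.pi * f_hz

def frequency_hz(omega):
    """Convert an angular frequency in s^-1 to a frequency in Hz."""
    return omega / (2.0 * math.pi)

omega_low = angular_frequency(300.0)    # about 1885 s^-1
omega_high = angular_frequency(3000.0)  # about 18850 s^-1
```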

This filtering process is not done in the cochlea but in the hair 
cells or in the nerve system behind the hair cells. 

Figs. 7, 8, 9, 10, 11, and 12 show the output of the processing of
transient signals in the vowels "a", "o", "i" in "hard key" and
"soft key" pronounced by a female and a male. Further, the figures
show plots of maxima of the output signals as a function of the
time constant of the corresponding filter.

The figures show that the maximum curves are very much alike for
the same vowels, independent of whether a female or a male
pronounces them.

With a library of templates and a distance measure it is possible
to identify the sound picture, and this can be used for speech
recognition and narrow band communication.
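
A minimal sketch of identification by template and distance measure
is given below. The Euclidean distance, the four-point curves and
the template values are invented for illustration; the text does not
specify which distance measure or template format is used.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two curves of equal length."""
    if len(a) != len(b):
        raise ValueError("curves must have the same number of points")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(curve, templates):
    """Return the name of the template closest to the measured curve."""
    return min(templates,
               key=lambda name: euclidean_distance(curve, templates[name]))

# Hypothetical library: maxima of the filter outputs versus time constant
TEMPLATES = {
    "a": [0.9, 0.7, 0.4, 0.2],
    "o": [0.3, 0.8, 0.9, 0.5],
    "i": [0.2, 0.3, 0.6, 0.9],
}

best = identify([0.85, 0.65, 0.35, 0.2], TEMPLATES)
```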

Thus, according to the invention a method and an apparatus are
provided for determination of a parameter of a system generating a
signal containing information about the parameter, in which the
signal is short time transformed substantially in accordance with

[equation missing in source]

in which v_i is the signal, L is the transformed signal, σ is a time
constant, ω is an angular frequency, and φ is a phase, or, in
accordance with another transformation which will give rise to an
L'(σ,ω,t) which in time intervals within which L(σ,ω,t) is larger
than 10% of its maximum value is not more than 50% different from
the result given by the short time Laplace transformation.
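
For orientation, the sketch below shows one way such a transform and
the 10 % / 50 % comparison could be computed. It assumes that the
short time Laplace transformation has the damped-cosine convolution
form L(σ,ω,t) = ∫ v_i(τ) exp(-σ(t-τ)) cos(ω(t-τ)+φ) dτ taken from 0
to t, realised as a filter with the complex conjugate pole pair
-σ ± jω. That form, the rectangular-rule discretisation, the reading
of "50% different" as a relative difference, and all names are
assumptions and are not taken from the text.

```python
import cmath
import math

def short_time_laplace(v, sigma, omega, phi, dt):
    """Assumed discrete form:
    L[n] = dt * sum_{k<=n} v[k] * exp(-sigma*(n-k)*dt)
                                 * cos(omega*(n-k)*dt + phi),
    computed with a complex one-pole recursion."""
    pole = cmath.exp(complex(-sigma, omega) * dt)
    rotation = cmath.exp(1j * phi)
    state = 0j
    out = []
    for sample in v:
        state = sample * dt + pole * state
        out.append((rotation * state).real)
    return out

def is_equivalent(L, L_alt, level=0.10, tolerance=0.50):
    """True if L_alt deviates from L by at most `tolerance` (relative)
    wherever |L| exceeds `level` times its maximum (assumed reading)."""
    peak = max(abs(x) for x in L)
    for x, y in zip(L, L_alt):
        if abs(x) > level * peak and abs(y - x) > tolerance * abs(x):
            return False
    return True

# Hypothetical parameters; a unit-area impulse returns the damped cosine
DT, SIGMA, OMEGA, PHI = 1e-4, 2000.0, 2.0 * math.pi * 1000.0, 0.3
impulse = [1.0 / DT] + [0.0] * 49
L = short_time_laplace(impulse, SIGMA, OMEGA, PHI, DT)
```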

In narrow band communication the transient pulses have to be
identified and coded, and the decoder will contain a library of
filters with corresponding transient responses. The decoder library
could also contain the transient responses.

The present invention also relates to measurement of mechanical
vibrations, e.g. when testing devices that generate mechanical
energy during operation, such as mechanical devices with moving
parts, such as compressors for refrigerators, electric motors,
household machines, electric razors, combustion engines, etc.

For example, it is known that measurement of vibration generated or
sound emitted by a device during operation can be useful for
detection of malfunction of the device. Certain failures may
generate sound or vibration of specific characteristics that can be
recognised.

The method may also comprise steps of classification for
classifying a tested device in accordance with the determined
parameters into one class of a set of predefined classes. Each
predefined class may be defined by a set of upper and lower limits
for specific parameters determined according to the method. A
device may then be classified as belonging to a certain class if
its corresponding parameter values lie within corresponding upper
and lower limits of the class.
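
The classification step can be sketched as a simple range test. The
class names, parameter names and limit values below are invented for
illustration and do not come from the text.

```python
def classify(params, classes):
    """Return the names of all classes whose lower and upper limits
    contain every corresponding parameter value."""
    matches = []
    for name, limits in classes.items():
        if all(low <= params[key] <= high
               for key, (low, high) in limits.items()):
            matches.append(name)
    return matches

# Hypothetical classes: parameter name -> (lower limit, upper limit)
CLASSES = {
    "normal": {"peak": (0.0, 1.0), "rise_time_ms": (2.0, 5.0)},
    "loose_bearing": {"peak": (1.5, 3.0), "rise_time_ms": (0.5, 2.0)},
}

result = classify({"peak": 2.0, "rise_time_ms": 1.0}, CLASSES)
```

A device whose parameters fall outside every class yields an empty
list in this sketch.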

Each class may correspond to a specific type of failure of the
device. For example, shaft imbalance, wheel imbalance, crookedness,
imperfections of teeth in cogs, tight bearings, loose bearings,
etc., may cause the device to vibrate in different characteristic
ways, whereby a characteristic mechanical vibration or sound is
generated for each type of failure. The type of failure of the
device may then be detected by comparing determined device
parameters with corresponding parameter values of various
predetermined classes.

The upper and lower limits of a specific class of devices may be
determined by testing a set of devices known to belong to that
class. For example, the upper limits may be determined as the
average of specific parameter values plus three times the standard
deviation. Likewise, the lower limits may be determined as the
average of parameter values minus three times the standard
deviation.
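
The limit calculation can be written out as below. The text does not
say whether the sample or the population standard deviation is
meant; the sketch assumes the sample standard deviation, and the
measurement values are invented.

```python
import statistics

def class_limits(values, k=3.0):
    """Return (lower, upper) limits as mean -/+ k standard deviations."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumed)
    return mean - k * std, mean + k * std

# Hypothetical parameter values from devices known to belong to one class
lower, upper = class_limits([9.0, 10.0, 11.0])
```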

CLAIMS

1. A method for determination of a parameter of a system generating
a signal containing information about the parameter, comprising the
step of short time transforming the signal substantially in
accordance with

L(σ,ω,t) = ∫ v_i( ... ) ... [lower limit 0; remainder of equation
illegible in source]

in which v_i is the signal, L is the transformed signal, σ is a time
constant, ω is an angular frequency, and φ is a phase.

2. A method according to claim 1, wherein the step of transforming
comprises filtering the signal v_i with a filter having a pole at
σ + jωt and a pole at σ - jωt.

3. A method according to claim 1 or 2, comprising steps of
transforming the signal v_i for a plurality of sets of σ and ω
values.

4. A method according to any of the preceding claims, further
comprising the step of determining a maximum of at least one
transformed signal L(σ,ω,t).

5. A method according to any of the preceding claims, further
comprising the step of comparing transformed signals L with
corresponding reference signals in order to determine parameters of
the system.

6. A method according to any of the preceding claims, further
comprising a step of pre-processing the signal before the step of
short time transforming, the pre-processing being selected from the
group consisting of filtering, rectification, differentiation,
integration, and amplification.

7. A method of transmitting a signal containing information of a
set of parameters of a system generating the signal, comprising
processing the signal according to any of the preceding claims and
further comprising the step of transmitting the determined
parameter values.

8. A method according to claim 7, further comprising the step of
generating a copy of the signal from the transmitted parameter
values.



9. A method of transmitting a signal containing information of a
set of parameters of a system generating the signal, comprising
processing the signal according to any of the preceding claims and
further comprising the steps of

comparing the signal with a library of signals generated for a
predetermined set of parameter values by the system,

selecting the library function that constitutes the best match to
the signal, and

transmitting an identification signal that identifies the matching
library function.

10. A method according to claim 9, further comprising the steps of
receiving the identification signal and generating the
corresponding library signal.

11. A method of classifying a system according to one or more
parameters of the system generating a signal containing information
about the one or more parameters, comprising determining the one or
more parameters according to any of claims 1-6 and further
comprising the step of classifying the system in accordance with






WO 99/48085 PCT/DK99/00128

the one or more determined parameters into one class of a set of
predefined classes defined by predetermined ranges of values of the
parameters.

12. A method for communicating an auditory signal, comprising
processing the signal by the method according to any of claims 1-6,
transmitting the processed signal, and receiving the processed
signal by a receiver.

13. A method according to claim 12, wherein, prior to transmission
of the processed signal, the signal is coded into a digital
representation, and the coded signal is decoded in the receiver so
as to reestablish transient pulse shapes perceived by an animal ear
such as a human ear as representing the distinct sound pictures of
the auditory signal.

14. A method according to claim 13, wherein the digital 
transmission is performed at a bandwidth of at the most 4000 bits 
per second. 



15. A method according to claim 14, wherein the bandwidth is at the
most 2000 bits per second.



16. A method according to claim 15, wherein the bandwidth is in the
interval of 800-2000 bits per second.

17. A method according to any of claims 13-16, wherein a second and
further pulses in a sequence of identical pulses are represented by
a digital value indicating repetition.

18. A method according to any of claims 1-6, comprising filtering
the signal v_i in a filter bank comprising a plurality of band-pass
filters interconnected in parallel with centre frequencies ranging
from 1400 Hz to 6500 Hz, each of which is connected in series with
an envelope detector and a filter bank comprising a plurality of
low-pass filters interconnected in parallel and having cut-off
frequencies ranging from 300 Hz to 3000 Hz and time constants
ranging from 1500 s⁻¹ to 12000 s⁻¹.

19. An apparatus for determination of a parameter of a system
generating a signal containing information about the parameter,
comprising a processor that is adapted to short time transform the
signal substantially in accordance with

[equation missing in source]

in which v_i is the signal, L is the transformed signal, σ is a time
constant, ω is an angular frequency, and φ is a phase.

20. An apparatus according to claim 19, wherein the processor
comprises a filter for filtering the signal v_i and having a pole at
σ + jωt and a pole at σ - jωt.

21. An apparatus according to claim 19 or 20, wherein the processor
comprises a plurality of filters for filtering the signal, each
filter having a different set of σ and ω values.



22. An apparatus according to claim 19, wherein the apparatus
comprises a communication channel transmitter, and the processor is
adapted to determine the one or several parameters of the system,
and to transmit the one or several system parameters over a
wireless or a cable communication channel.